Schenectady County
FAITH: A Framework for Assessing Intrinsic Tabular Hallucinations in Finance
Zhang, Mengao, Fu, Jiayu, Warrier, Tanya, Wang, Yuwen, Tan, Tianhui, Huang, Ke-wei
Hallucination remains a critical challenge for deploying Large Language Models (LLMs) in finance. Accurate extraction and precise calculation from tabular data are essential for reliable financial analysis, since even minor numerical errors can undermine decision-making and regulatory compliance. Financial applications have unique requirements, often relying on context-dependent, numerical, and proprietary tabular data that existing hallucination benchmarks rarely capture. In this study, we develop a rigorous and scalable framework for evaluating intrinsic hallucinations in financial LLMs, conceptualized as a context-aware masked span prediction task over real-world financial documents. Our main contributions are: (1) a novel, automated dataset creation paradigm using a masking strategy; (2) a new hallucination evaluation dataset derived from S&P 500 annual reports; and (3) a comprehensive evaluation of intrinsic hallucination patterns in state-of-the-art LLMs on financial tabular data. Our work provides a robust methodology for in-house LLM evaluation and serves as a critical step toward building more trustworthy and reliable financial Generative AI systems.
- Asia > Singapore > Central Region > Singapore (0.06)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New York > Schenectady County > Schenectady (0.04)
- (4 more...)
- Banking & Finance > Trading (0.48)
- Banking & Finance > Financial Services (0.46)
- Information Technology > Software (0.34)
- South America (0.05)
- North America > United States > New York > Schenectady County (0.05)
- North America > United States > Minnesota (0.05)
- (4 more...)
NoteBar: An AI-Assisted Note-Taking System for Personal Knowledge Management
Wisoff, Josh, Tang, Yao, Fang, Zhengyu, Guzman, Jordan, Wang, YuTang, Yu, Alex
Note-taking is a critical practice for capturing, organizing, and reflecting on information in both academic and professional settings. The recent success of large language models has accelerated the development of AI-assisted tools, yet existing solutions often struggle with efficiency. We present NoteBar, an AI-assisted note-taking tool that leverages persona information and efficient language models to automatically organize notes into multiple categories and better support user workflows. To support research and evaluation in this space, we further introduce a novel persona-conditioned dataset of 3,173 notes and 8,494 annotated concepts across 16 MBTI personas, offering both diversity and semantic richness for downstream tasks. Finally, we demonstrate that NoteBar can be deployed in a practical and cost-effective manner, enabling interactive use without reliance on heavy infrastructure. Together, NoteBar and its accompanying dataset provide a scalable and extensible foundation for advancing AI-assisted personal knowledge management.
- North America > United States > New York > Monroe County > Rochester (0.05)
- North America > United States > Ohio > Cuyahoga County > Cleveland (0.04)
- North America > United States > New York > Schenectady County > Schenectady (0.04)
- (3 more...)
ExoMiner++ on TESS with Transfer Learning from Kepler: Transit Classification and Vetting Catalog for 2-min Data
Valizadegan, Hamed, Martinho, Miguel J. S., Jenkins, Jon M., Twicken, Joseph D., Caldwell, Douglas A., Maynard, Patrick, Wei, Hongbo, Zhong, William, Yates, Charles, Donald, Sam, Collins, Karen A., Latham, David, Barkaoui, Khalid, Berlind, Perry, Calkins, Michael L., Carden, Kylee, Chazov, Nikita, Esquerdo, Gilbert A., Guillot, Tristan, Krushinsky, Vadim, Nowak, Grzegorz, Rackham, Benjamin V., Triaud, Amaury, Schwarz, Richard P., Stephens, Denise, Stockdale, Chris, Wang, Jiaqi, Watkins, Cristilyn N., Wilkin, Francis P.
We present ExoMiner++, an enhanced deep learning model that builds on the success of ExoMiner to improve transit signal classification in 2-minute TESS data. ExoMiner++ incorporates additional diagnostic inputs, including periodogram, flux trend, difference image, unfolded flux, and spacecraft attitude control data, all of which are crucial for effectively distinguishing transit signals from more challenging sources of false positives. To further enhance performance, we leverage transfer learning from high-quality labeled data from the Kepler space telescope, mitigating the impact of TESS's noisier and more ambiguous labels. ExoMiner++ achieves high accuracy across various classification and ranking metrics, significantly narrowing the search space for follow-up investigations to confirm new planets. To serve the exoplanet community, we introduce new TESS catalogs containing ExoMiner++ classifications and confidence scores for each transit signal. Among the 147,568 unlabeled TCEs, ExoMiner++ identifies 7,330 as planet candidates, with the remainder classified as false positives. These 7,330 planet candidates correspond to 1,868 existing TESS Objects of Interest (TOIs), 69 Community TESS Objects of Interest (CTOIs), and 50 newly introduced CTOIs. 1,797 out of the 2,506 TOIs previously labeled as planet candidates in ExoFOP are classified as planet candidates by ExoMiner++. This reduction in plausible candidates combined with the excellent ranking quality of ExoMiner++ allows the follow-up efforts to be focused on the most likely candidates, increasing the overall planet yield.
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Oceania > Australia (0.04)
- (17 more...)
- Research Report > New Finding (0.45)
- Research Report > Experimental Study (0.45)
- Government > Space Agency (0.68)
- Government > Regional Government > North America Government > United States Government (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis
Zhou, Fangshuo, Li, Huaxia, Hu, Rui, Wu, Sensen, Feng, Hailin, Du, Zhenhong, Xu, Liuchang
Volunteer Geographic Information (VGI), with its rich variety, large volume, rapid updates, and diverse sources, has become a critical source of geospatial data. However, VGI data from platforms like OSM exhibit significant quality heterogeneity across different data types, particularly with urban building data. To address this, we propose a multi-source geographic data transformation solution, utilizing accessible and complete VGI data to assist in generating urban building footprint data. We also employ a multimodal data generation framework to improve accuracy. First, we introduce a pipeline for constructing an 'image-text-metadata-building footprint' dataset, primarily based on road network data and supplemented by other multimodal data. We then present ControlCity, a geographic data transformation method based on a multimodal diffusion model. This method first uses a pre-trained text-to-image model to align text, metadata, and building footprint data. An improved ControlNet further integrates road network and land-use imagery, producing refined building footprint data. Experiments across 22 global cities demonstrate that ControlCity successfully simulates real urban building patterns, achieving state-of-the-art performance. Specifically, our method achieves an average FID score of 50.94, reducing error by 71.01% compared to leading methods, and a MIoU score of 0.36, an improvement of 38.46%. Additionally, our model excels in tasks like urban morphology transfer, zero-shot city generation, and spatial data completeness assessment. In the zero-shot city task, our method accurately predicts and generates similar urban structures, demonstrating strong generalization. This study confirms the effectiveness of our approach in generating urban building footprint data and capturing complex city characteristics.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Conjunctive categorial grammars and Lambek grammars with additives
Kuznetsov, Stepan L., Okhotin, Alexander
A new family of categorial grammars is proposed, defined by enriching basic categorial grammars with a conjunction operation. It is proved that the formalism obtained in this way has the same expressive power as conjunctive grammars, that is, context-free grammars enhanced with conjunction. It is also shown that categorial grammars with conjunction can be naturally embedded into the Lambek calculus with conjunction and disjunction operations. This further implies that a certain NP-complete set can be defined in the Lambek calculus with conjunction. We also show how to handle some subtle issues connected with the empty string. Finally, we prove that a language generated by a conjunctive grammar can be described by a Lambek grammar with disjunction (but without conjunction).
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.05)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- (4 more...)
Cutsets and EF1 Fair Division of Graphs
Chen, Jiehua, Zwicker, William S.
In fair division of a connected graph $G = (V, E)$, each of $n$ agents receives a share of $G$'s vertex set $V$. These shares partition $V$, with each share required to induce a connected subgraph. Agents use their own valuation functions to determine the non-negative numerical values of the shares, which determine whether the allocation is fair in some specified sense. We introduce forbidden substructures called graph cutsets, which block divisions that are fair in the EF1 (envy-free up to one item) sense by cutting the graph into "too many pieces". Two parameters - gap and valence - determine blocked values of $n$. If $G$ guarantees connected EF1 allocations for $n$ agents with valuations that are CA (common and additive), then $G$ contains no elementary cutset of gap $k \ge 2$ and valence in the interval $\[n - k + 1, n - 1\]$. If $G$ guarantees connected EF1 allocations for $n$ agents with valuations in the broader CM (common and monotone) class, then $G$ contains no cutset of gap $k \ge 2$ and valence in the interval $\[n - k + 1, n - 1\]$. These results rule out the existence of connected EF1 allocations in a variety of situations. For some graphs $G$ we can, with help from some new positive results, pin down $G$'s spectrum - the list of exactly which values of $n$ do/do not guarantee connected EF1 allocations. Examples suggest a conjectured common spectral pattern for all graphs. Further, we show that it is NP-hard to determine whether a graph admits a cutset. We also provide an example of a (non-traceable) graph on eight vertices that has no cutsets of gap $\ge 2$ at all, yet fails to guarantee connected EF1 allocations for three agents with CA preferences.
- Europe > Austria > Vienna (0.14)
- North America > United States > New York > Schenectady County > Schenectady (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (3 more...)
LF-VISLAM: A SLAM Framework for Large Field-of-View Cameras with Negative Imaging Plane on Mobile Agents
Wang, Ze, Yang, Kailun, Shi, Hao, Li, Peng, Gao, Fei, Bai, Jian, Wang, Kaiwei
Simultaneous Localization And Mapping (SLAM) has become a crucial aspect in the fields of autonomous driving and robotics. One crucial component of visual SLAM is the Field-of-View (FoV) of the camera, as a larger FoV allows for a wider range of surrounding elements and features to be perceived. However, when the FoV of the camera reaches the negative half-plane, traditional methods for representing image feature points using [u,v,1]^T become ineffective. While the panoramic FoV is advantageous for loop closure, its benefits are not easily realized under large-attitude-angle differences where loop-closure frames cannot be easily matched by existing methods. As loop closure on wide-FoV panoramic data further comes with a large number of outliers, traditional outlier rejection methods are not directly applicable. To address these issues, we propose LF-VISLAM, a Visual Inertial SLAM framework for cameras with extremely Large FoV with loop closure. A three-dimensional vector with unit length is introduced to effectively represent feature points even on the negative half-plane. The attitude information of the SLAM system is leveraged to guide the feature point detection of the loop closure. Additionally, a new outlier rejection method based on the unit length representation is integrated into the loop closure module. We collect the PALVIO dataset using a Panoramic Annular Lens (PAL) system with an entire FoV of 360{\deg}x(40{\deg}~120{\deg}) and an Inertial Measurement Unit (IMU) for Visual Inertial Odometry (VIO) to address the lack of panoramic SLAM datasets. Experiments on the established PALVIO and public datasets show that the proposed LF-VISLAM outperforms state-of-the-art SLAM methods. Our code will be open-sourced at https://github.com/flysoaryun/LF-VISLAM.
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- North America > United States > New York > Schenectady County > Schenectady (0.04)
- Transportation (0.34)
- Information Technology (0.34)
Efficiently-Verifiable Strong Uniquely Solvable Puzzles and Matrix Multiplication
We advance the Cohn-Umans framework for developing fast matrix multiplication algorithms. We introduce, analyze, and search for a new subclass of strong uniquely solvable puzzles (SUSP), which we call simplifiable SUSPs. We show that these puzzles are efficiently verifiable, which remains an open question for general SUSPs. We also show that individual simplifiable SUSPs can achieve the same strength of bounds on the matrix multiplication exponent $\omega$ that infinite families of SUSPs can. We report on the construction, by computer search, of larger SUSPs than previously known for small width. This, combined with our tighter analysis, strengthens the upper bound on the matrix multiplication exponent from $2.66$ to $2.505$ obtainable via this computational approach, and nears the results of the handcrafted constructions of Cohn et al.
- North America > United States > Virginia (0.04)
- North America > United States > New York > Schenectady County > Schenectady (0.04)
- Europe > Germany (0.04)
- Asia > Middle East > Jordan (0.04)
AI/ML, Data Science Jobs #hiring
General Electric Company is an American multinational conglomerate incorporated in New York City and headquartered in Boston. As of 2018, the company operates through the following segments: aviation, healthcare, power, renewable energy, digital industry, additive manufacturing and venture capital and finance. If you have forgotten your password you can reset it here.
- Asia > India > Karnataka > Bengaluru (0.11)
- North America > United States > Massachusetts > Middlesex County > Marlborough (0.09)
- North America > United States > New York > Schenectady County > Schenectady (0.07)
- (5 more...)
- Industrial Conglomerates (0.92)
- Energy (0.92)
- Banking & Finance (0.57)